17 research outputs found
Absolute Convergence of Rational Series is Semi-decidable
International audienceWe study \emph{real-valued absolutely convergent rational series}, i.e. functions , defined over a free monoid , that can be computed by a multiplicity automaton and such that . We prove that any absolutely convergent rational series can be computed by a multiplicity automaton which has the property that is simply convergent, where is the series computed by the automaton derived from by taking the absolute values of all its parameters. Then, we prove that the set composed of all absolutely convergent rational series is semi-decidable and we show that the sum can be estimated to any accuracy rate for any . We also introduce a spectral radius-like parameter which satisfies the following property: is absolutely convergent iff
Impact Of The Energy Model On The Complexity Of RNA Folding With Pseudoknots
International audiencePredicting the folding of an RNA sequence, while allowing general pseudoknots (PK), consists in finding a minimal free-energy matching of its positions. Assuming independently contributing base-pairs, the problem can be solved in -time using a variant of the maximal weighted matching. By contrast, the problem was previously proven NP-Hard in the more realistic nearest-neighbor energy model. In this work, we consider an intermediate model, called the stacking-pairs energy model. We extend a result by Lyngs\o, showing that RNA folding with PK is NP-Hard within a large class of parametrization for the model. We also show the approximability of the problem, by giving a practical algorithm that achieves at least a -approximation for any parametrization of the stacking model. This contrasts nicely with the nearest-neighbor version of the problem, which we prove cannot be approximated within any positive ratio, unless .La prédiction du repliement, avec pseudonoeuds généraux, d'une séquence d'ARN de taille est équivalent à la recherche d'un couplage d'énergie libre minimale. Dans un modèle d'énergie simple, où chaque paire de base contribue indépendamment à l'énergie, ce problème peut être résolu en temps grâce à une variante d'un algorithme de couplage pondéré maximal. Cependant, le même problème a été démontré NP-difficile dans le modèle d'énergie dit des plus proches voisins. Dans ce travail, nous étudions les propriétés du problème sous un modèle d'empilements, constituant un modèle intermédiaire entre ceux d'appariement et des plus proches voisins. Nous démontrons tout d'abord que le repliement avec pseudo-noeuds de l'ARN reste NP-difficile dans de nombreuses valuations du modèle d'énergie. . Par ailleurs, nous montrons que ce problème est approximable, en proposant un algorithme polynomial garantissant une -approximation. Ce résultat illustre une différence essentielle entre ce modèle et celui des plus proches voisins, pour lequel nous montrons qu'il ne peut être approché à aucun ratio positif par un algorithme en temps polynomial sauf si
Inapproximability of maximal strip recovery
In comparative genomic, the first step of sequence analysis is usually to
decompose two or more genomes into syntenic blocks that are segments of
homologous chromosomes. For the reliable recovery of syntenic blocks, noise and
ambiguities in the genomic maps need to be removed first. Maximal Strip
Recovery (MSR) is an optimization problem proposed by Zheng, Zhu, and Sankoff
for reliably recovering syntenic blocks from genomic maps in the midst of noise
and ambiguities. Given genomic maps as sequences of gene markers, the
objective of \msr{d} is to find subsequences, one subsequence of each
genomic map, such that the total length of syntenic blocks in these
subsequences is maximized. For any constant , a polynomial-time
2d-approximation for \msr{d} was previously known. In this paper, we show that
for any , \msr{d} is APX-hard, even for the most basic version of the
problem in which all gene markers are distinct and appear in positive
orientation in each genomic map. Moreover, we provide the first explicit lower
bounds on approximating \msr{d} for all . In particular, we show that
\msr{d} is NP-hard to approximate within . From the other
direction, we show that the previous 2d-approximation for \msr{d} can be
optimized into a polynomial-time algorithm even if is not a constant but is
part of the input. We then extend our inapproximability results to several
related problems including \cmsr{d}, \gapmsr{\delta}{d}, and
\gapcmsr{\delta}{d}.Comment: A preliminary version of this paper appeared in two parts in the
Proceedings of the 20th International Symposium on Algorithms and Computation
(ISAAC 2009) and the Proceedings of the 4th International Frontiers of
Algorithmics Workshop (FAW 2010
A Combinatorial Framework for Designing (Pseudoknotted) RNA Algorithms
We extend an hypergraph representation, introduced by Finkelstein and
Roytberg, to unify dynamic programming algorithms in the context of RNA folding
with pseudoknots. Classic applications of RNA dynamic programming energy
minimization, partition function, base-pair probabilities...) are reformulated
within this framework, giving rise to very simple algorithms. This
reformulation allows one to conceptually detach the conformation space/energy
model -- captured by the hypergraph model -- from the specific application,
assuming unambiguity of the decomposition. To ensure the latter property, we
propose a new combinatorial methodology based on generating functions. We
extend the set of generic applications by proposing an exact algorithm for
extracting generalized moments in weighted distribution, generalizing a prior
contribution by Miklos and al. Finally, we illustrate our full-fledged
programme on three exemplary conformation spaces (secondary structures,
Akutsu's simple type pseudoknots and kissing hairpins). This readily gives sets
of algorithms that are either novel or have complexity comparable to classic
implementations for minimization and Boltzmann ensemble applications of dynamic
programming
Metrics and similarity measures for hidden markow models
Hidden Markov models were introduced in the beginning of the 1970's as a tool in speech recognition. During the last decade they have been found useful in addressing problems in computational biology such as characterising sequence families, gene nding, structure prediction and phylogenetic analysis. In this paper we propose several measures between hidden Markov models. We give an ecient algorithm that computes the measures for left-right models, e.g. prole hidden Markov models, and briey discuss how to extend the algorithm to other types of models. We present an experiment using the measures to compare hidden Markov models for three classes of signal peptides. Introduction A hidden Markov model describes a probability distribution over a potentially innite set of sequences. It is convenient to think of a hidden Markov model as generating a sequence according to some probability distribution by following a rst order Markov chain of states, called the path, from a sp..
Approximating the 2-Interval Pattern problem
We address the problem of approximating the 2-Interval Pattern problem over its various models and restrictions. This problem, which is motivated by RNA secondary structure prediction, asks to find a maximum cardinality subset of a 2-interval set with respect to some prespecified model. For each such model, we give varying approximation quality depending on the different possible restrictions imposed on the input 2-interval set
Predicting RNA Secondary Structures: One-grammar-fits-all Solution
LNCS v. 9096 entitled: Bioinformatics Research and Applications: 11th International Symposium, ISBRA 2015 Norfolk, USA, June 7-10, 2015 ProceedingsRNA secondary structures are known to be important in many biological processes. Many available programs have been developed for RNA secondary structure prediction. Based on our knowledge, however, there still exist secondary structures of known RNA sequences which cannot be covered by these algorithms. In this paper, we provide an efficient algorithm that can handle all RNA secondary structures found in Rfam database. We designed a new stochastic context-free grammar named Rectangle Tree Grammar (RTG) which significantly expands the classes of structures that can be modelled. Our algorithm runs in O(n 6) time and the accuracy is reasonably high, with average PPV and sensitivity over 75%. In addition, the structures that RTG predicts are very similar to the real ones
Learning Stochastic Finite Automata
Abstract. Stochastic deterministic finite automata have been introduced and are used in a variety of settings. We report here a number of results concerning the learnability of these finite state machines. In the setting of identification in the limit with probability one, we prove that stochastic deterministic finite automata cannot be identified from only a polynomial quantity of data. If concerned with approximation results, they become Pac-learnable if the L ∞ norm is used. We also investigate queries that are sufficient for the class to be learnable.